Abstract

Background: Down syndrome (DS; trisomy 21) is the most common genetic cause of mental retardation in the
human population, and the key molecular networks dysregulated in DS are still unknown. Many different experimental
techniques have been applied to analyse the effects of dosage imbalance at the molecular and phenotypic level;
however, no integrative approach currently exists that attempts to extract the common information.

Results: We have performed a statistical meta-analysis of 45 heterogeneous publicly available DS data sets in
order to identify consistent dosage effects across these studies. We identified 324 genes with significant
genome-wide dosage effects, including well investigated genes like SOD1, APP, RUNX1 and DYRK1A as well as a large
proportion of novel genes (N = 62). Furthermore, we characterized these genes using gene ontology, molecular
interactions and promoter sequence analysis. In order to judge the relevance of the 324 genes for more general
cerebral pathologies, we used independent publicly available microarray data from brain studies not related to DS
and identified a subset of 79 genes with potential impact on neurocognitive processes. All results have been made
available through a web server at http://ds-geneminer.molgen.mpg.de/.

Conclusions: Our study represents a comprehensive integrative analysis of heterogeneous data including
genome-wide transcript levels in the domain of trisomy 21. The detected dosage effects build a resource for
further studies of DS pathology and the development of new therapies.



Background 

Down syndrome (DS) is the most frequent genomic aneuploidy, with an incidence of approximately 1 in 700
live births [1], and results from the presence of an extra copy of human chromosome 21 (HSA21). DS is
characterized by a complex phenotype with features that are not fully penetrant. The most frequent
manifestations, which are virtually always present, include mental retardation, morphological abnormalities
of the head and limbs, short stature, hypotonia and hyperlaxity of ligaments. Other features occur less
frequently, such as organ malformations, particularly of the heart (50% of DS newborns), several types of
gastrointestinal tract obstructions or dysfunctions (4-5% of DS newborns), increased risk of leukaemia
(20 times higher than in the normal population), and early occurrence of an Alzheimer-like neuropathology
[2,3]. DS has been investigated in multiple functional genomics studies aiming to understand the molecular
basis underlying the various aspects of the disease [4-7].

The most commonly accepted pathogenetic hypothesis is that the dosage imbalance of genes on HSA21 is
responsible for the molecular dysfunctions in DS, meaning that genes on the triplicated chromosome are
overexpressed due to the extra copy of chromosome 21, as demonstrated for selected genes like SOD1 and
DYRK1A [8]. Recent global transcriptome studies with microarrays, however, have generated a more complex
picture in the sense that not all HSA21 genes show the expected elevated expression level [9,10]. An
alternative hypothesis is that the phenotype is due to an unstable environment resulting from the dosage
imbalance of the hundreds of genes on HSA21, which causes a non-specific disturbance of genomic regulation
and expression. The significantly higher inter-individual variability in DS individuals, as compared to
euploid individuals, supports this hypothesis [11]. Moreover, the two hypotheses could coexist [3]. In both
hypotheses it is understood that, besides alterations in the expression of HSA21 genes, there are numerous
genome-wide effects that lead to the dysregulation of many non-HSA21 genes through molecular pathways and
interactions.

Many studies on the transcriptome and proteome levels have been conducted to understand the causal
relationship between genes at dosage imbalance and DS phenotypes [12]. Gene expression profiles have been
analysed from DS fetal [13] and adult human tissues [6]. Additionally, two classes of mouse models [14] have
been developed for investigating the molecular genetics of DS: mouse models with partial trisomies of the
regions of mouse chromosomes 10, 16 or 17 that are syntenic to HSA21, such as Ts16 [15], Ts65Dn [16] and
Ts1Cje mice [17], and transgenic mice for specific genes such as SOD1 [18]. Studies of gene expression
profiles in human DS samples and mouse models have shown high genome-wide variability [11,19-22].
Furthermore, differences due to the applied experimental platforms, specific tissues, developmental stages
or the triplicated segments under study introduce high variation into the assessment of genome-wide effects
of DS. Here, integrative and comparative studies are pivotal for the analysis of the complex nature of gene
expression and regulation in DS at a more general level [2,23].

Meta-analysis has proven to be a valid strategy for extracting consistent information from heterogeneous
data, in particular with respect to complex phenotypes, for example cancer [24], Alzheimer's disease [25]
and type 2 diabetes mellitus [26]. The purpose of meta-analysis is to compensate for experiment-specific
variations and to reveal consistent information across a wide range of experiments. To date, such a
meta-analysis of DS data is missing.

In this paper we describe a comprehensive meta-analysis of 45 different DS studies in human and mouse at
the transcriptome and proteome level, including quantitative data such as Affymetrix microarrays, RT-PCR
and MALDI studies as well as qualitative data such as SAGE and Western blot analyses. We applied an
established computational framework [26] and identified 324 genes with consistent dosage effects in many of
these studies. As expected, we observed a high fraction of HSA21 genes (N = 77) but also a large number of
non-HSA21 genes (N = 247). Besides genes that are well investigated in the context of DS, we detected a
significant proportion of novel ones (N = 62). The 324 genes were further investigated using functional
information, molecular interactions and promoter analysis, revealing over-represented motifs of four
transcription factors: RUNX1, E2F1, STAF/PAX2 and STAT3. In order to test the relevance of the 324 genes
for more general brain phenotypes, we used independent publicly available data on cerebral pathologies not
related to DS and identified a subset of 79 DS genes that were differentially expressed in these studies.
The detected dosage effects can be used as a resource for further studies of DS pathology, functional
experiments and the development of therapies. All data have been agglomerated and made available through a
web server that tracks the results of the meta-analysis (http://ds-geneminer.molgen.mpg.de/) and enables
the community to validate any gene of interest in the light of the experimental data.

Results 

Genome-Wide Dosage Effects 

Genome-wide dosage effects were computed with the numerical scoring method described in Materials and
Methods. In total, 45 case-control experiments were interrogated (Additional file 1, Table S1). The
alteration of each gene between the trisomic and normal states was scored in each experiment, gene scores
were summarised across all experiments, and the significance of the summarised scores was judged with a
bootstrap approach. This procedure resulted in a cut-off score of 3.67 and identified 324 genes as being
predominantly affected by DS. The thirty genes with the highest dosage effects, either on HSA21 or on other
chromosomes, are listed in Table 1. The entire gene list is given in Additional file 1, Table S2.
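The scoring method itself is specified in Materials and Methods and in [26]; the following is only a minimal sketch of the general idea of summarising per-experiment scores and deriving a cut-off by resampling. The function names, the plain sum used as summary, and the resampling scheme (one randomly drawn score per experiment) are illustrative assumptions, not the published framework.

```python
import random


def summarise_scores(scores_by_gene):
    """Sum each gene's per-experiment scores (assumed summary rule)."""
    return {gene: sum(scores) for gene, scores in scores_by_gene.items()}


def bootstrap_cutoff(scores_by_gene, n_resamples=1000, quantile=0.95, seed=0):
    """Estimate a score cut-off from a resampled null distribution.

    Each resample draws, for every experiment, one score at random from the
    values observed in that experiment and sums the draws. This breaks the
    link between scores and gene identity (hypothetical null model).
    """
    rng = random.Random(seed)
    genes = list(scores_by_gene)
    n_experiments = len(scores_by_gene[genes[0]])
    columns = [[scores_by_gene[g][j] for g in genes] for j in range(n_experiments)]
    null_sums = sorted(
        sum(rng.choice(column) for column in columns) for _ in range(n_resamples)
    )
    index = min(int(quantile * len(null_sums)), len(null_sums) - 1)
    return null_sums[index]


def significant_genes(scores_by_gene, cutoff):
    """Return the genes whose summarised score reaches the cut-off."""
    totals = summarise_scores(scores_by_gene)
    return sorted(gene for gene, total in totals.items() if total >= cutoff)
```

In such a scheme, a gene passes only if its summed score exceeds what random combinations of experiment-level scores would typically produce, which is the role played by the cut-off of 3.67 in the text.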

The meta-analysis identified genes that showed consistent changes in many of the different experiments
rather than genes that were affected in a single experiment or only a few experiments (Figure 1A). This is
important since, for example, different mouse models differ in their coverage of triplicated HSA21 genes
and thus might introduce model-specific bias [14]. The consistency of the dosage effect was measured for
each gene with an entropy criterion (see Materials and Methods), and Figure 1A reveals a strong preference
for the selection of high-entropy genes. The highest scores were assigned to HSA21 genes (Figure 1B), which
indicates that the meta-analysis scores reflect the effect of an extra chromosome 21 on gene expression
(Table 1). While proportionally most dosage effects were identified for HSA21 genes (77 out of 324), the
majority of genes (247 out of 324) were located on other chromosomes, highlighting the genome-wide impact
of DS (Figure 1C).
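The entropy criterion is defined in Materials and Methods. As a hedged illustration of why high entropy corresponds to consistency, the sketch below assumes the criterion is the Shannon entropy of a gene's normalised absolute per-experiment scores; this specific form and the name `score_entropy` are assumptions made for illustration.

```python
import math


def score_entropy(scores):
    """Shannon entropy (in bits) of a gene's absolute score contributions.

    A gene whose total score is spread evenly over many experiments has high
    entropy; a gene driven by a single experiment has entropy zero.
    """
    weights = [abs(s) for s in scores]
    total = sum(weights)
    if total == 0:
        return 0.0
    probabilities = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probabilities)
```

Under this assumed definition, a gene with equal contributions from four experiments has an entropy of 2 bits, whereas a gene with the same total score from one experiment has an entropy of 0.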

Genome-wide dosage effects underlined the severe phenotypic consequences of DS caused by genes with a
major role in human development (Additional file 2, Figure S1). Of the 247 non-HSA21 genes, 72 were
associated with development, in particular with organ development (62 genes, GO:0048513), tissue
development (34 genes, GO:0009888) and cell development (30 genes, GO:0048468). Amongst these genes were
known interactors of HSA21 genes, for example REST (RE1-silencing transcription factor). REST modulates
the expression of genes encoding fundamental neuronal functions, including ion channels, synaptic proteins
and neurotransmitter receptors, and has been linked to an inherited form of mental retardation. Recently,
Canzonetta et al. [5] demonstrated that the region capable of affecting REST levels, in both mouse and
human cells, could be assigned to the DYRK1A locus on HSA21, which was found among the top-scoring HSA21
genes (Table 1).

Table 1 Top thirty DS dosage effects on A) HSA21 and B) other chromosomes

TXNIP (thioredoxin interacting protein) had the highest dosage effect (8.79) of all non-HSA21 genes. So
far it has only a weak association with DS (through S100B [27]), but it could play a major role in several
DS phenotypes. It is a key signalling molecule involved in glucose homeostasis [28], cardiovascular
homeostasis [29] and leukaemia [30].

Enrichment of genomic location with respect to the 324 genes was observed in regions of HSA21 and the
respective syntenic regions on mouse chromosomes 16, 17 and 10 (Additional file 3, Figure S2). Moreover, in
the human genome, additional enrichment was computed for chr3q24, which contains the genes GYG1
(glycogenin), PLOD2 (involved in bone morphogenesis), PLSCR4 and CHST2 (involved in inflammatory response
in vascular endothelial cells).

Dosage Effects on HSA21 

Proportionally, HSA21 contributed most to the detected dosage effects (Figure 1C). On the other hand, it
is remarkable that only a third of all HSA21 genes (77 out of 255 studied here using the Ensembl genome
annotation [31]) showed consistent effects across the different experiments (see also Discussion). While 57
genes had a positive score below the significance threshold of 3.67, indicating relevance with respect to
specific experiments only, 121 genes had a score near zero, indicating that dosage effects were either
compensated or not detected with the selected experimental data (Figure 1B).

HSA21 dosage effects included, for example, APP (beta-amyloid precursor protein), involved in senile
plaque formation in DS and Alzheimer's disease [3]; SOD1 (superoxide dismutase 1), a key enzyme in the
metabolism of oxygen-derived free radicals [3]; DYRK1A (dual-specificity tyrosine-(Y)-phosphorylation
regulated kinase 1A), involved in neuroblast proliferation and crucial for brain function, learning and
memory [32]; RUNX1 (runt-related transcription factor 1), which plays a critical role in normal
hematopoiesis [33]; and GABPA (GA binding protein transcription factor, alpha subunit 60 kDa), encoding a
DNA binding domain with a huge variety of targets including genes with different cell/tissue specificities
and functions [34]. HSA21 genes were mostly up-regulated in gene expression studies (69 out of 77), with
the exception of eight genes that were either variable or down-regulated (SLC5A3, MRPS6, B3GALT6, CBS,
KCNJ6, KCNJ15, CLDN14, COL18A1). Possible explanations for this observation are tissue specificity of gene
expression, as in the case of MRPS6, which was mostly up-regulated in brain samples and down-regulated in
other tissues like heart or kidney, or differences between human and mouse gene expression, as in the case
of CBS, which was up-regulated in human but down-regulated in mouse experiments; the latter might be caused
by differential tissue specificity of the orthologous mouse gene [35].

Three genomic regions on HSA21 were enriched with the significant genes using the MSigDB_c1 positional
database: chr21q22, chr21q21 and chr21q11, located on the q-terminal arm (Figure 1D). This contradicts the
hypothesis that a single region on HSA21 with only a few responsive genes could be responsible for the
molecular and phenotypic consequences of DS [36,37]. Rather, our findings support studies that identified
more than one HSA21 region as causative for DS phenotypes: the dosage effects were not uniformly
distributed along the chromosome but enriched in certain regions on HSA21, similar to the results in
[38,39].

Figure 1 Characterization of dosage effects. A) Entropy (Y-axis) vs. score of dosage effect (X-axis) for all genes, B) Histogram of scores for all
255 HSA21 genes accessible with the experiments under study, C) Distribution of genomic locations of the 324 candidate genes, D) Cytogenetic
location of 77 HSA21 genes that show significant dosage effects for all experiments (blue line). Additionally, the same meta-analysis approach
has been conducted with human (green line) and mouse (red line) data separately. The yellow line plots the relative number of HSA21 genes
within each band (gene density). Y-axis shows percentage of significant genes with respect to all genes annotated for the chromosomal band.

Functional Annotation Using Gene Enrichment Analysis 

Functional annotation of biological pathways was retrieved from the ConsensusPathDB [40], a meta-database
that summarizes the content of 22 human interaction databases. A total of 1,695 pre-defined pathways were
screened with the 324 genes using gene set enrichment analysis [41]. A total of 277 pathways were found to
be significantly enriched (family-wise error rate (FWER) < 0.01), of which several were associated with
neurological and neuropathological processes (Table 2). These pathways referred mainly to (i)
neurodegeneration (e.g. Huntington's disease, Alzheimer's disease or Parkinson's disease) and (ii) defects
in synapses (e.g. axon guidance, NGF signalling). Furthermore, the results emphasized the role of
tyrosine-kinase receptors in DS pathology (for example P75(NTR)-mediated signalling or NGF signalling via
TRKA), which interact directly with BDNF (brain-derived neurotrophic factor). Moreover, our results showed
gene dosage effects caused either directly by genes located on HSA21 (e.g. SOD1, APP, DONSON, TIAM1,
COL6A2, ITSN1 and BACE2) or indirectly by HSA21 interactors, highlighting the intrinsic complexity of the
DS pathology. For example, PIK3R1 de-regulation impacts many of these pathways, and PIK3R1 is a direct
interactor of IFNAR1, a significant DS gene. A similar effect can be observed for TJP1, which interacts
with the HSA21 genes JAM2 and CLDN8, both showing consistent dosage effects (cf. Figure 2A).
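The pathway screen above relies on gene set enrichment analysis [41]. As a simpler stand-in that conveys the same logic, the following sketch uses a hypergeometric over-representation test with a Bonferroni correction as one possible way of controlling the FWER. It is not the procedure of [41]; the function names and the choice of test are assumptions made for illustration.

```python
from math import comb


def hypergeom_pvalue(universe_size, pathway_size, list_size, overlap):
    """P(X >= overlap) for a gene list drawn without replacement.

    universe_size: number of annotated genes considered
    pathway_size:  number of those genes in the pathway
    list_size:     number of genes in the list of interest
    overlap:       number of list genes that fall into the pathway
    """
    upper = min(pathway_size, list_size)
    total = comb(universe_size, list_size)
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values, capped at 1.0 (controls the FWER)."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]
```

With 1,695 pathways tested, a Bonferroni-style threshold of 0.01 would require a raw p-value below roughly 0.01 / 1695 for a pathway to be reported.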

Dosage Effects on Transcriptional Regulation 

Dysregulation of transcription is widely reported in DS [34]. Among the 324 significant genes were 13
transcription factors (TFs) (PSIP1, RBPJ, TCF4, HES1, ETS2, BACH1, RUNX1, GABPA, SNAI2, REST, LITAF, EGR1,
FOS); 6 TFs (PSIP1, HOXC8, DLX5, HIVEP3, ZNF187, ATF6) had significant enrichment of their targets as
retrieved from the TRANSFAC [42] database. Additionally, 57 TFs had significant enrichment of their
interacting proteins when judged with physical interactions retrieved from the ConsensusPathDB [40]. In
total, 70 different TFs were identified as being (directly or indirectly) affected by dosage imbalances.
The list of TFs and their associated functional categories is given in Additional file 1, Table S3. GO
categories indicate a broad impact of transcriptional regulation on neurological development: central
nervous system development (RUNX1 and TP53), nervous system development (DLX5, FOS, HES1, STAT3 and
EP300), axonogenesis (DLX5, NOTCH1 and CREB1), neuron differentiation (HOXC8, NOTCH1 and RUNX1), negative
regulation of neuron differentiation (HES1, NOTCH1 and REST) and regulation of long-term neuronal synaptic
plasticity and learning or memory (EGR1 and JUN). Other prominent categories refer to organ development
(RBPJ, ETS2, GABPA and SNAI2) and stress response (ATF6, FOS and RELA).

Table 2 Enriched neuropathological pathways

PATHWAY (Source Database) | Pathway size | P-value | FWER P-value | Genes on HSA21 | HSA21 Interactors | Others
HUNTINGTONS DISEASE (KEGG) | 159 | 0 | 0 | SOD1; DONSON | REST | BDNF; SOD2
ALZHEIMERS DISEASE (KEGG) | 147 | 0 | 0 | APP; BACE2; DONSON | PPP3CA; GSK3B | CAPN2
SIGNALLING BY NGF (REACTOME) | 209 | 0 | 0 | ITSN1; TIAM1 | PIK3R1; GSK3B | RPS6KA2; RAP1A; KRAS
AXON GUIDANCE (REACTOME) | 256 | 0 | 0 | COL6A2 | GSK3B; COL1A1; COL1A2; COL4A1; COL4A2 | COL5A2; DPYSL3; RPS6KA2; LAMB1; COL3A1; COL5A1; ALCAM; KRAS
PARKINSONS DISEASE (KEGG) | 105 | 0 | 0 | DONSON | | UBE2G2
P75(NTR)-MEDIATED SIGNALING (PID) | 68 | 0 | 0 | APP | PIK3R1 | BDNF
NOTCH (NETPATH) | 61 | 0 | 0 | APP | PIK3R1; GSK3B |
NEUROTROPHIN SIGNALING PATHWAY (KEGG) | 121 | 0 | 0 | | PIK3R1; GSK3B | BDNF; RPS6KA2; RAP1A; KRAS
NGF SIGNALLING VIA TRKA FROM THE PLASMA MEMBRANE (REACTOME) | 127 | 0 | 0 | | PIK3R1; GSK3B | RPS6KA2; RAP1A; KRAS
MEMBRANE TRAFFICKING (REACTOME) | 87 | 0 | 0 | | TJP1 | GJA1; COPG
NEUROTROPHIC FACTOR-MEDIATED TRK RECEPTOR SIGNALING (PID) | 60 | 0 | 0 | TIAM1 | PIK3R1 | BDNF; RAP1A; KRAS
EPO SIGNALING (INOH) | 180 | 0 | 0 | | PIK3R1; GSK3B |
CDC42 SIGNALING EVENTS (PID) | 68 | 0 | 0 | TIAM1 | PIK3R1; GSK3B | EPS8; YES1
L1CAM INTERACTIONS (REACTOME) | 93 | 0 | 0 | | | LAMB1; ALCAM 1; RPS6KA2

Figure 2 Molecular interactions of HSA21 genes. A) Interactions of HSA21 genes (red) with non-HSA21 genes (other colours). Same colours
of the gene nodes refer to the same chromosome. B) Example of consistent down-regulation of DNAJB1 as a consequence of HSA21 imbalance
visualized in the web browser.



We further analyzed the promoter sequences of the 324 genes for enrichment of transcription factor binding
sites using the AMADEUS software [43]. Significant enrichment was computed for four TF motifs: E2F1, RUNX1,
STAF/PAX2 and STAT3 (Table 3). Enrichment was evident for RUNX1, which is among the most studied genes
implicated in DS. The implication of E2F1 in DS has also been reported previously [34]; E2F1 could be
responsible for the impaired cell proliferation documented for the hippocampus, cerebellum and astrocytes
of DS mouse models.

Dosage Effects and Molecular Interactions 

Molecular interactions among the 324 significant genes on HSA21 and on other chromosomes formed a complex
network, supporting the important role of physical interactions as transmitters of dosage effects (Figure
2A). The consequences of HSA21 triplication for the interacting genes were fairly stable, as Figure 2B
demonstrates. For example, DNAJB1 (DnaJ (Hsp40) homolog, subfamily B, member 1) and PPP3CA (protein
phosphatase 3, catalytic subunit, alpha isozyme; data not shown), both interacting with SOD1, were
consistently and significantly down-regulated in the human microarray experiments, as the fold-changes and
P-values indicate. Opposite trends were observed for TJP1 and RHOQ.

Assessing General Relevance of DS Dosage Effects for 
Neurological Processes 

We were further interested in identifying, among the 324 genes, those that were relevant for other brain
disorders. To achieve this, we interrogated 19 independent data sets derived from publicly available
microarray data (Additional file 1, Table S4). These studies addressed heterogeneous research questions on
different cerebral pathologies and identified a total of 623 differentially expressed genes. Gene set
enrichment analyses [41] with the 324 genes and the corresponding lists of differentially expressed genes
were significant for 10 of these 19 studies, with 79 overlapping genes (Figure 3A). Furthermore, we used
the HSA21 database (http://chr21.molgen.mpg.de/hsa21) [4], a resource of RNA in situ hybridizations in
postnatal mouse brain sections, in order to provide independent supporting evidence of brain expression of
these 79 genes, as shown for example for BACH1 (basic leucine zipper transcription factor 1) and TTC3
(tetratricopeptide repeat domain 3) (Figure 3B and 3C).



Additionally, we investigated the expression patterns of the 79 genes across the DS microarray experiments
used for this meta-analysis and could identify brain-related signatures, for example a clear up-regulation
in brain tissues for the cluster containing C14orf147, IVNS1ABP, B2M, TJP1, SPARC, CTGF, COL4A1 and FSTL1
(Figure 3D).
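According to the caption of Figure 3, the clustering in panel D used Pearson correlation as similarity measure and complete linkage as update rule in the J-Express 2009 software. The sketch below illustrates these two choices in plain Python; it is not the J-Express implementation, and the function names and the use of 1 - r as distance are assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def complete_linkage(profiles, n_clusters):
    """Agglomerative clustering of gene profiles down to n_clusters groups.

    Distance between genes is 1 - Pearson r; the distance between two
    clusters is the largest pairwise gene distance (complete linkage).
    """
    names = list(profiles)
    dist = {
        (a, b): 1.0 - pearson(profiles[a], profiles[b])
        for a in names
        for b in names
        if a != b
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(dist[(a, b)] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(cluster) for cluster in clusters]
```

Because the distance depends only on correlation, genes with the same direction of change across experiments group together regardless of the magnitude of their fold-changes.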

Novel Dosage Effects 

To identify DS-relevant "novel" dosage effects, we excluded from the 324 genes (i) HSA21 genes, (ii) genes
that interacted with HSA21 genes, and (iii) genes that were associated with DS in the literature (Table 4).
The remaining candidates (N = 62) comprised BDNF-related genes (SST), MAPK-pathway genes (KRAS, IGF1R,
GNG11 and RAP1A), genes related to leukemia (SFRP1) and Rho proteins (DHCR7 and RAB21). SST was found in
previous studies to be co-expressed with TAC1 [44], which is also significant in our meta-analysis, and
both showed a strong correlation across DS studies (Figure 4A).
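The three exclusion steps amount to a set difference. A minimal sketch follows; the function name and the toy gene sets in the usage note are illustrative assumptions and do not reproduce the actual annotation sources used for the exclusion.

```python
def novel_candidates(significant, hsa21_genes, hsa21_interactors, literature_genes):
    """Remove (i) HSA21 genes, (ii) HSA21 interactors and (iii) genes already
    linked to DS in the literature from the list of significant genes."""
    excluded = set(hsa21_genes) | set(hsa21_interactors) | set(literature_genes)
    return sorted(set(significant) - excluded)
```

For example, calling `novel_candidates(["SOD1", "PIK3R1", "SST", "KRAS"], {"SOD1"}, {"PIK3R1"}, set())` with these toy sets leaves `SST` and `KRAS` as candidates.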

Novel candidates are associated with neurodegenerative disorders including Alzheimer's disease (VSNL1),
prion disease (SCRG1, HSPH1, HSPA5 (Figure 4B) and CTR9) and age-related degeneration (GAS6 and GNG11).
Moreover, candidates could explain evident DS features (Additional file 1, Table S5): (i) genes related to
neurogenesis and neurite outgrowth (LPAR1 [45], LIN7C, JARID2, GREM1, SERPINE2, IGF1R and SPOCK1) that
could be related to mental retardation or cognitive impairment, (ii) genes involved in synapses (AGT, KRAS,
ATP1A2, GNAI2, SST and LIN7C), (iii) cytoskeleton-related proteins (KANK1 [46] (Figure 4C), CKAP2, CKAP4,
HAT1, NEK7 and VAMP5), (iv) macular degeneration genes [47] or genes associated with age-related visual
problems (HTRA1 and EFEMP1) [48], (v) genes (AGT, CNN3, FBN1, RBPJ, PON2, POSTN, RAP1A, WNK1 and STK39)
that were related to cardiac impairments and could be candidates to explain this DS characteristic [49],
and (vi) genes related to cancer (BTBD3 [50], DNAJB4 [51], FIBP [52] and GSTZ1 [53]) [54].



Table 3 Enriched TFBSs

TF | Description | Chromosome | P-value | Binding motif | Strand
E2F1 | E2F transcription factor 1 | chr20 | 9.3*10^-18 | [C/t][C/a][G/c]C[c/a][C/g][G/c][C^G[G/c]A |
RUNX1 | runt-related transcription factor 1 | chr21 | 4.0*10^-18 | [C/a/t][T/a/g][G/C]{A}[G/c]{A}T[C/A][G][C/a/t/g] |
STAF/PAX2 | paired box 2 | chr10 | 8.4*10^-18 | [A/g][A/g]A[C/T/a]U/g/a][T/c][C/t][C/g][C/a] |
STAT3 | signal transducer and activator of transcription 3 (acute-phase response factor) | chr17 | 8.4*10^-17 | GAA[A/T][(yT]G[(yT][<7g/t][A^[C^T/g] |

Binding motifs have been represented using the IUPAC nomenclature and incorporating lower case for low frequency bases.




Figure 3 Brain-related dosage effects. A) Venn diagram showing the overlap of the 324 significant genes with 623 genes identified by
independent mouse studies related to brain phenotypes; B) RNA in situ hybridisations of BACH1 in postnatal mouse embryonic brain slices. C) In
situ hybridisation of TTC3 in the same tissue. Images kindly provided by the HSA21 consortium ([4]; http://chr21.molgen.mpg.de/hsa21). D)
Hierarchical clustering of 79 genes related to non-DS general brain disorders with the DS gene expression data sets. Clustering was performed
with the J-Express 2009 software using Pearson correlation as similarity measure and complete linkage as update rule.



These examples show that the meta-analysis approach identified multiple additional genes that might be
involved in DS pathology. In order to enable the community to check any particular gene of interest for DS
relevance in the studies under analysis, we have agglomerated all information of the meta-analysis into a
web interface (http://ds-geneminer.molgen.mpg.de/). Examples of possible views and information are shown in
Figure 4.

Table 4 Novel DS dosage effects

Ensembl | HUGO | Score | Entropy | Chromosome | Start position | End position | Band | CNV
ENSG00000133110 | POSTN | 8.301 | 2.437 | chr13 | | | q13.3 | YES
ENSG00000135919 | SERPINE2 | 5.904 | 3.048 | chr2 | | | q36.1 |
ENSG00000172893 | DHCR7 | 5.883 | 3.441 | chr11 | | | q13.4 |
ENSG00000135744 | AGT | 5.467 | 2.799 | chr1 | | | q42.2 |
ENSG00000159176 | CSRP1 | 5.424 | 3.136 | chr1 | | | q32.1 |
ENSG00000178695 | KCTD12 | 5.344 | 2.373 | chr13 | | | q22.3 |
ENSG00000183087 | GAS6 | 5.129 | 2.904 | chr13 | | | q34 |
ENSG00000164106 | SCRG1 | 5.127 | 2.728 | chr4 | | | q34.1 |
ENSG00000166923 | GREM1 | 5.073 | 1.486 | chr15 | | | q13.3 | YES
ENSG00000163754 | GYG1 | 4.933 | 3.129 | chr3 | | | q24 |
ENSG00000155380 | SLC16A1 | 4.878 | 2.927 | chr1 | | | p13.2 |
ENSG00000166033 | HTRA1 | 4.811 | 3.101 | chr10 | | | q26.13 | YES
ENSG00000145632 | PLK2 | 4.785 | 2.811 | chr5 | | | q11.2 |
ENSG00000115380 | EFEMP1 | 4.726 | 2.257 | chr2 | | | p16.1 |
ENSG00000060237 | WNK1 | 4.637 | 2.765 | chr12 | | | p13.33 |
ENSG00000103888 | KIAA1199 | 4.581 | 0.885 | chr15 | | | q25.1 |
ENSG00000104332 | SFRP1 | 3.825 | 2.587 | chr8 | 41119483 | 41166992 | p11.21 |
ENSG00000116473 | RAP1A | 3.824 | 2.769 | chr1 | 112084840 | 112259313 | p13.2 |
ENSG00000172500 | FIBP | 3.804 | 3.309 | chr11 | 65651211 | 65656010 | q13.1 | YES
ENSG00000133703 | KRAS | 3.801 | 3.338 | chr12 | 25358182 | 25403854 | p12.1 |
ENSG00000163032 | VSNL1 | 3.798 | 3.099 | chr2 | 17720393 | 17838285 | p24.2 |
ENSG00000134684 | YARS | 3.765 | 3.431 | chr1 | 33240840 | 33283754 | p35.1 |
ENSG00000105854 | PON2 | 3.764 | 2.862 | chr7 | 95034179 | 95064510 | q21.3 |
ENSG00000148943 | LIN7C | 3.763 | 3.033 | chr11 | 27516124 | 27528303 | p14.1 |
ENSG00000198648 | STK39 | 3.722 | 3.439 | chr2 | 168810530 | 169104651 | q24.3 |
ENSG00000100577 | GSTZ1 | 3.713 | 2.759 | chr14 | 77787230 | 77797939 | q24.3 |
ENSG00000080371 | RAB21 | 3.707 | 3.312 | chr12 | 72148658 | 72181150 | q21.1 | YES
ENSG00000136108 | CKAP2 | 3.688 | 2.960 | chr13 | 53029495 | 53050485 | q14.3 |
ENSG00000066583 | ISOC1 | 3.686 | 2.655 | chr5 | 128430442 | 128449721 | q23.3 |
ENSG00000143420 | ENSA | 3.681 | 3.276 | chr1 | 150573327 | 150602098 | q21.3 |
ENSG00000114353 | GNAI2 | 3.680 | 3.138 | chr3 | 50263724 | 50296787 | p21.31 | YES
ENSG00000140105 | WARS | 3.671 | 2.994 | chr14 | 100800125 | 100842680 | q32.2 |
ENSG00000018625 | ATP1A2 | 3.670 | 2.733 | chr1 | 160085549 | 160113381 | q23.2 |

Discussion

The statistical meta-analysis approach was described previously by Rasche et al. [26]. The score computed
with the meta-analysis correlates with entropy (Figure 1A), indicating the ability to identify general
dosage effects across many experiments that might be of more phenotypic relevance than very specific ones.
Additional file 4, Figures S3A and B provide an overview of the different sources of data, including two
organisms (human and mouse), different tissues (brain, heart and others), different stages of development
(adult, postnatal, embryonic) and different mouse models (Ts65Dn, Ts1Cje, Tc1). It is per se interesting
that, in spite of such heterogeneity, common dosage effects could be identified at all, and it should be
highlighted that whole-genome data were fairly robust across experiments. Additional file 4, Figure S3D
shows the overall correlation of the quantitative PCR and microarray values averaged over all experiments,
with only a few genes in the non-concordant sectors of the graph (red points).
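The comparison of averaged PCR and microarray values (Additional file 4, Figure S3D) distinguishes concordant from non-concordant genes. One simple way to express such a check is sketched below, assuming that both platforms report averaged log fold-changes so that the sign encodes the direction of regulation; the function name and this representation are assumptions and are not taken from the original analysis.

```python
def split_by_concordance(pcr_values, array_values):
    """Split shared genes by whether both platforms agree in direction.

    Both arguments map gene symbols to averaged log fold-changes
    (assumed representation). Genes with a zero value on either
    platform are left out of both lists.
    """
    shared = sorted(set(pcr_values) & set(array_values))
    concordant = [g for g in shared if pcr_values[g] * array_values[g] > 0]
    discordant = [g for g in shared if pcr_values[g] * array_values[g] < 0]
    return concordant, discordant
```

Genes in the second list correspond to the non-concordant sectors of such a scatter plot.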

The score used in this analysis allows the detection of genes that could be either up- or down-regulated
in different studies. An overview of the fold-changes for the genes across the different experiments is
given in Additional file 1, Table S6. Because genes might change their expression level depending on the
developmental state, tissue or other variables, we expected that this flexibility would allow us to check
the hypothesis of random disturbances as well as the hypothesis of increased expression of HSA21 genes. We
detected a clear enrichment of up-regulated genes on the q-terminal part of HSA21 (Figure 1D and Additional
file 3, Figure S2). However, not a single region was identified but rather several smaller regions on HSA21
that agglomerate a large number of significant dosage effects. This finding was also reported before
(Korbel et al. [38] and Lyle et al. [39]) using two independent data sets to characterize the molecular
HSA21 regions in a set of DS patients with partial duplications.

We studied 255 HSA21 genes that could be matched to probe
sets on the microarrays. Of these, only 77 showed
consistent dosage effects (Figure 1). While 165 HSA21
genes had score values different from zero, indicating a
response in some of the microarray studies, 90 HSA21
genes were not responsive at all, which provides evidence
for a strong mechanism of dosage compensation. On the
other hand, these figures could also reflect the limited
ability of microarray technology to detect fold-changes of
low magnitude reliably. Furthermore, the experiments
covered only a limited number of tissues, so it is likely
that some genes were missed simply because they were not
responsive in the tissues under analysis. However, because
brain was the dominant whole-genome sample source, most of
the genes should be expressed. Microarray data were
restricted to the Affymetrix platform in order to reduce
variance arising from platform inconsistencies. We have
also compared our results with additional studies,
including our own previous research [9] and others [55],
and found that selected dosage effects are relevant in
other tissues as well (data not shown). Additional
cross-validation was performed with an independent
microarray data set [10]. These authors compared human
lymphoblastoid cell lines derived from DS patients and
normal controls with a custom-made HSA21 array.
Yahya-Graison et al. [10] divided the expression ratios
into four classes: class I and class II genes were
significantly up-regulated, while class III and class IV
genes were either compensated or showed variable response.
Our meta-analysis revealed a high degree of concordance,
considering that the cell model, platform and methodology
used were completely different. The meta-analysis scores
were significantly higher for class I and II genes than
for class III and IV genes (P-value < 0.01, Additional
file 5, Figure S4). 25 out of 39 class I-II genes revealed
a significant score in our meta-analysis (75%).

In this study we monitored molecular interactions of
HSA21 genes that might function as drivers of dosage
effects (Figure 2A). For example, (i) TJP1 (tight junction
protein ZO-1) interacts with two HSA21 genes, JAM2 and
CLDN8, (ii) FOS (FBJ murine osteosarcoma viral oncogene
homolog) interacts with the HSA21 genes ETS2, SUMO3 and
RUNX1 and indirectly with ERG, (iii) RHOQ (ras homolog
gene family, member Q) interacts directly with ITSN1 and
TIAM1 and indirectly with SYNJ, and (iv) PIK3R1 interacts
directly with IFNAR1 and indirectly with IFNAR2. It should
be emphasized that current information on molecular
interactions is far from complete; we might therefore miss
important interactions not yet detected, and we might
count false-positive interactions due to the high error
rates of current interaction annotations.

Figure 4 Novel DS dosage effects visualised with the web browser. A) SST and TAC1 have been previously reported as acting in a complex.
The correlated deregulation of these genes is shown here with the fold-change view of the web browser. B) HSPA5 is a novel gene for
DS implicated in neurodegeneration; it is also a target of the ATF6 TF, whose target set was enriched with significant genes. The histogram
displays the p-values for this gene in individual studies. C) KANK1, a gene previously related to paternally inherited cerebral palsy, shows a
consistent trend of up-regulation in the considered studies, as shown with the fold-change view of the web browser.

Several of the DS genes (N = 79) extrapolated to more
general neurological phenotypes (Figure 3A). The
dendrogram (Figure 3D) shows further interesting profiles
of these genes in the DS samples under analysis: (i)
differential gene expression in the cerebellum region
versus whole "brain" or cerebrum areas, which has been
reported in other studies (e.g. Moldrich et al. [56]), (ii)
different patterns of gene expression associated with
particular developmental stages (P0, P15 and P30), as
reported before by Dauphinot et al. [57], and (iii)
differences in ES studies.

We further analyzed human and mouse studies separately
and found 182 significant dosage effects using only human
data and 107 using only mouse data. The Venn diagram in
Additional file 4, Figure S3C shows that combining the two
species allows additional dosage effects to be detected.
Overlapping dosage effects were detected for 29 genes in
both analyses (Additional file 1, Table S7). Results for
the human- and mouse-specific analyses can be found in
Additional file 1, Tables S8 and S9. It should be noted
that comparisons between human and mouse using microarrays
are inherently difficult and have limitations, since the
probes for orthologous mouse and human genes do not
correspond well. Furthermore, gene expression variation is
generally higher in human individuals than in inbred mouse
strains. Nonetheless, the 107 genes found in the analysis
of mouse data (derived from the different mouse models for
trisomy 21) represent a core set of genes responsive
across different DS mouse models and, thus, could be
highly relevant for DS pathogenesis.

In addition to genes commonly related to DS, we have
identified novel genes that can be associated with DS
phenotypes, in particular with neural development and
neurodegeneration. To the best of our knowledge, this
study is the first meta-analysis of genome-wide transcript
levels along with other data domains in DS research. The
aggregated data can be accessed through the web server at
http://ds-geneminer.molgen.mpg.de and the identified
dosage effects are a resource for further functional
testing and therapeutic development.
Conclusions

We have identified a set of 324 genes with consistent
dosage effects from 45 different experiments related to
DS. Since the meta-analysis was enriched with brain
experiments, we were able to detect a high fraction of
genes related to neurodevelopment, synapses and
neurodegeneration. Moreover, our results provide
information about known and new pathways related to DS as
well as 62 novel candidates. The results of the
meta-analysis as well as the source data have been made
accessible to the community through a web interface.

Material and methods 

Selection and integration of DS resources 

Data sets were selected from heterogeneous technical
platforms, different model systems (human cell lines,
human tissues, mouse models) and different developmental
stages (Additional file 1, Table S1). For each gene and
for each source we computed a numerical value that
measures its dosage effect. Data categories were either
qualitative or quantitative. Qualitative data incorporated
a total of 30 published manuscripts, including reviews and
semi-quantitative studies, as well as two SAGE studies
[21,58], and were summarised to one score point in order
to avoid over-scoring. Here, a "1" means that the gene was
found to be DS-relevant in one (or more) studies.
Quantitative data from differential gene expression
studies such as Affymetrix microarrays, RT-PCR, MALDI and
other quantitative techniques were evaluated in order to
extract comparable information across the different
studies. We considered Affymetrix studies that provided
the raw data (CEL file level). Raw data were extracted
from Gene Expression Omnibus (GEO, [59]) or ArrayExpress
[60], or were retrieved from the authors' web pages (in
total 16 data sets including human tissues and four
different mouse models: Ts65Dn, Ts1Cje, Tc1 and
Ts + HSA21). Furthermore, we incorporated 18 RT-PCR and
MALDI data sets for which the authors reported the
information for all genes under study (either significant
or not).

Mapping of gene IDs 

A central prerequisite of any meta-analysis approach is
the consolidation of the different ID types, for example
those coming from different organisms and from different
chip versions. We used the Ensembl database (version 56)
as the backbone annotation for all studies. IDs were
mapped to human Ensembl gene IDs. Mapping and merging of
the information was done with R and Bioconductor. In
total, information on 19,388 Ensembl genes was mapped.
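
The consolidation step can be illustrated with a minimal sketch. It is written in Python for brevity, whereas the analysis itself was done in R. The lookup tables, the identifiers in them and the function names are hypothetical placeholders, not the actual Ensembl version 56 mappings.

```python
from collections import defaultdict

# Hypothetical lookup tables. In practice they would be exported from
# Ensembl; the identifiers below are illustrative only.
PROBE_TO_HUMAN_GENE = {
    "probe_h1": "ENSG00000142192",
    "probe_h2": "ENSG00000142168",
}
MOUSE_GENE_TO_HUMAN_ORTHOLOG = {
    "ENSMUSG_example_1": "ENSG00000142192",  # placeholder mouse gene ID
}


def to_human_gene(identifier):
    """Resolve a study-specific identifier to a human Ensembl gene ID, or None."""
    if identifier in PROBE_TO_HUMAN_GENE:
        return PROBE_TO_HUMAN_GENE[identifier]
    return MOUSE_GENE_TO_HUMAN_ORTHOLOG.get(identifier)


def merge_studies(studies):
    """Collect per-study values under a common human Ensembl gene ID.

    `studies` maps a study name to {identifier: value}. Identifiers that
    cannot be mapped are dropped; if several identifiers of one study map
    to the same gene, the last value seen is kept (a simplification).
    """
    merged = defaultdict(dict)
    for study, values in studies.items():
        for identifier, value in values.items():
            gene = to_human_gene(identifier)
            if gene is not None:
                merged[gene][study] = value
    return dict(merged)
```

Keying everything on one backbone annotation means that human and mouse studies, and different chip versions, end up in the same gene-by-study table.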



Mapping SAGE IDs 

Differentially expressed tags were extracted from the
additional files of the studies. Identifiers (based on
sequences) were cross-referenced with the information
provided in the regularly updated tables (SAGEmap_Hs and
SAGEmap_Mn) from the SAGE site
ftp://ftp.ncbi.nlm.nih.gov/pub/sage/mappings.

Transcriptome data pre-processing and normalization 

We incorporated only case-control studies in the
meta-analysis in order to derive expression fold-changes.
Affymetrix gene chip annotations were adapted from the
latest genome annotation (version 12). Affymetrix data
were normalized with GCRMA. For transcriptome case-control
studies three pieces of information were stored for each
gene: (i) the fold-change (DS vs. controls), (ii) the
standard error of the fold-change from the replicated
experiments in that study and (iii) the expression p-value
(presence call) that indicates whether or not the gene is
expressed in the target samples under study. For RT-PCR
and MALDI experiments we computed the fold-change of the
mean expression (DS vs. controls) and recorded the
reported standard error of the ratio. When the mean and
standard deviation for each group (DS and controls) were
provided, we calculated the ratios as well as their
associated standard errors.
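
The text does not state how the standard error of a ratio was derived from group means and standard deviations. The sketch below assumes first-order error propagation (the delta method) for two independent groups; the function name and arguments are illustrative.

```python
import math


def ratio_with_standard_error(mean_ds, sd_ds, n_ds, mean_ctrl, sd_ctrl, n_ctrl):
    """Return (fold-change, standard error) for DS vs. control expression.

    Assumption: the groups are independent and the relative variances of
    the two means add (first-order error propagation).
    """
    if mean_ds <= 0 or mean_ctrl <= 0:
        raise ValueError("group means must be positive")
    ratio = mean_ds / mean_ctrl
    # Standard error of each group mean, expressed relative to the mean.
    rel_se_ds = sd_ds / math.sqrt(n_ds) / mean_ds
    rel_se_ctrl = sd_ctrl / math.sqrt(n_ctrl) / mean_ctrl
    std_err = ratio * math.sqrt(rel_se_ds ** 2 + rel_se_ctrl ** 2)
    return ratio, std_err
```

For example, `ratio_with_standard_error(150, 30, 4, 100, 20, 4)` gives a fold-change of 1.5, with each group contributing a relative standard error of 10%.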

Scoring DS dosage effects across studies 

In order to score the different categories of information,
such as binary counts and quantitative gene expression
values, we summarized the scores of the individual
experiments for each category. For microarray studies,
the score of the i-th gene in the j-th study, $s_{ij}$,
was computed as described in Rasche et al. [26]:

$$
s_{ij} =
\begin{cases}
\left|\log_2(r_{ij})\right| \, (1 - p_{ij}) \left(1 - \dfrac{e_{ij}}{r_{ij}}\right) & \text{if } p_{ij} < 0.1 \text{ and } e_{ij}/r_{ij} < 1 \\
0 & \text{else}
\end{cases}
$$

Here $r_{ij}$ is the fold-change, $p_{ij}$ is the average
detection p-value and $e_{ij}$ is the standard error of the
ratio derived from the experimental replicates of the
study. Thus, the fold-change is weighted with its
reproducibility across the experimental replicates and
with the likelihood of the gene being expressed in the
study's case or control samples.

For RT-PCR and MALDI studies we applied the following
equation:

$$
s_{ij} =
\begin{cases}
\left|\log_2(r_{ij})\right| \left(1 - \dfrac{e_{ij}}{r_{ij}}\right) & \text{if } e_{ij}/r_{ij} < 1 \\
0 & \text{else}
\end{cases}
$$

Here $r_{ij}$ is the fold-change and $e_{ij}$ is the
standard error of the ratio.

The total score of the gene was computed as the sum
across all individual study scores.
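
The scoring scheme can be sketched as follows. The sketch assumes that the microarray score multiplies the absolute log2 fold-change by (1 - detection p-value) and by (1 - relative standard error), which is how the weighting is described in the text, and that the RT-PCR/MALDI score omits the detection term. Function names are illustrative.

```python
import math


def microarray_score(ratio, detection_p, std_err):
    """Score of one gene in one microarray study (assumed form, see lead-in)."""
    if ratio <= 0:
        raise ValueError("fold-change must be positive")
    rel_err = std_err / ratio
    if detection_p < 0.1 and rel_err < 1:
        return abs(math.log2(ratio)) * (1 - detection_p) * (1 - rel_err)
    return 0.0


def pcr_score(ratio, std_err):
    """Score of one gene in one RT-PCR or MALDI study (no detection p-value)."""
    if ratio <= 0:
        raise ValueError("fold-change must be positive")
    rel_err = std_err / ratio
    if rel_err < 1:
        return abs(math.log2(ratio)) * (1 - rel_err)
    return 0.0


def total_score(study_scores):
    """Total score of a gene: sum of its individual study scores.

    The single qualitative score point (0 or 1) can be passed in as one
    more element of `study_scores`.
    """
    return sum(study_scores)
```

Because the absolute log2 fold-change is used, up- and down-regulation contribute equally, and a gene that is unreliable or not expressed in a study contributes nothing for that study.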






Sampling for significance 

In order to assess the significance of the overall gene
scores we generated random scores by re-sampling the
scores 50,000 times with replacement within the same
study. Using this random distribution as background, we
considered as significant those genes whose score was
above the 99.9th percentile of the background
distribution.
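
One plausible reading of this procedure is sketched below: each random total score is built by drawing one score per study from that study's own scores, and the 99.9th percentile of the resulting totals serves as the cut-off. The function names, the nearest-rank percentile and the fixed seed are assumptions made for illustration.

```python
import math
import random


def background_totals(study_scores, n_draws=50_000, seed=0):
    """Random total scores obtained by sampling within each study.

    `study_scores` is a list with one sequence of gene scores per study.
    Each draw picks one score per study (with replacement) and sums them.
    """
    rng = random.Random(seed)
    columns = [list(scores) for scores in study_scores if len(scores) > 0]
    return [sum(rng.choice(column) for column in columns) for _ in range(n_draws)]


def percentile_threshold(values, percentile=99.9):
    """Nearest-rank percentile of `values`."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def significant_genes(gene_totals, threshold):
    """Names of genes whose total score lies above the background threshold."""
    return sorted(gene for gene, total in gene_totals.items() if total > threshold)
```

Sampling within each study keeps the score distribution of every study intact and only breaks the link between studies, which is what a gene with a consistent dosage effect should stand out against.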

Judging consistency of dosage effects 

For each gene, the entropy of the score distribution was
computed in order to quantify the relevance of the gene
across many experiments. Let $s_{ij}$ be the score of the
i-th gene in the j-th study; then $E_i$ is a measure for
the uniformity of the score distribution over the
individual experiments:

$$
E_i = -\sum_{j} f_{ij}, \qquad
f_{ij} =
\begin{cases}
\dfrac{s_{ij}}{\sum_k s_{ik}} \log_2 \dfrac{s_{ij}}{\sum_k s_{ik}} & \text{if } s_{ij} > 0 \\
0 & \text{else}
\end{cases}
$$


High entropy is assigned to a gene if many experi- 
ments contribute to the overall score whereas low 
entropy is assigned if only a few experiments contribute 
to the overall score. 
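
As an illustration, the entropy of a gene's score distribution can be computed as below. The sketch assumes the standard Shannon entropy over the normalised study scores, with studies that score zero contributing nothing; the function name is illustrative.

```python
import math


def score_entropy(scores):
    """Shannon entropy (base 2) of one gene's scores across studies."""
    total = sum(s for s in scores if s > 0)
    if total == 0:
        return 0.0
    entropy = 0.0
    for s in scores:
        if s > 0:
            fraction = s / total  # share of the total score from this study
            entropy -= fraction * math.log2(fraction)
    return entropy
```

A gene scoring equally in four studies has entropy 2 (log2 of 4), whereas a gene whose whole score comes from a single study has entropy 0, so the measure separates consistent effects from study-specific ones.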

Enrichment analysis 

Gene Set Enrichment Analysis (GSEA, [41]) of the 324
genes was performed with respect to pre-defined human
pathways aggregated from 22 pathway resources in
ConsensusPathDB ([40], http://cpdb.molgen.mpg.de).
Over-representation analysis of TF target sets was
performed with Fisher's test based on annotation from
TRANSFAC [42]. Motif enrichment analyses were performed
using AMADEUS [43] with the significant genes as target
sets and all genes considered in the meta-analysis as
background set.
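
The over-representation test for a TF target set can be sketched as follows. It assumes a one-sided Fisher's exact test, which is equivalent to the upper tail of the hypergeometric distribution; the text does not state the sidedness, and the function name and example counts are hypothetical.

```python
from math import comb


def overrepresentation_p(n_background, n_targets, n_significant, n_overlap):
    """P-value for seeing at least `n_overlap` TF targets among significant genes.

    n_background  : all genes considered in the meta-analysis
    n_targets     : genes in the TF target set
    n_significant : genes with a significant dosage effect
    n_overlap     : significant genes that are also TF targets
    """
    denominator = comb(n_background, n_significant)
    upper = min(n_targets, n_significant)
    tail = sum(
        comb(n_targets, k) * comb(n_background - n_targets, n_significant - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / denominator
```

With the numbers of this study, `n_background` would be 19,388 and `n_significant` 324, while `n_targets` and `n_overlap` depend on the TRANSFAC annotation of each TF.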

Selection of independent brain experiments 

In order to assess the general brain relevance of the 324
genes, we collected DS-independent gene expression studies
addressing brain features that were performed with
Affymetrix technology and deposited in GEO or ArrayExpress
(Additional file 1, Table S4). Most of these experiments
were performed in mouse tissues. For each study we
collected one or more resulting gene lists that were
evaluated using Gene Set Enrichment Analysis (GSEA, [41])
against the complete list of 19,388 genes ranked by score.



Additional material

Additional file 1: Supplementary tables. Table S1. Data sources used
for meta-analysis. Table S2. The 324 candidate genes detected in the 
meta-analysis study. Table S3. Transcription factors and associated GO 
terms. Table S4. Cross-validation studies. Table S5. Functional annotation 
of novel candidates. Table S6. Fold-changes and qualitative data. Table 
S7. Human and mouse data overlap. Table S8. DS genes derived from 
meta-analysis of human data. Table S9. DS genes derived from meta- 
analysis of mouse data. 

Additional file 2: Figure S1. Enrichment of GO categories for organ,
tissue and cell development with respect to the significant HSA21 genes 
(red bars), the significant non-HSA21 genes (green bars) and the non- 
significant genes (blue bars). 

Additional file 3: Figure S2. Genomic location of DS dosage effects in 
A) human B) mouse. Significant genes are marked in red, non-significant 
genes in white. 

Additional file 4: Figure S3. A) Categorization of the 35 qualitative 
studies, B) Categorization of the 34 quantitative studies. C) Venn diagram 
of dosage effects detected with mouse and human data alone and with 
the combination of all data, D) correlation between average PCR and 
microarray values for the detected 324 dosage effects. 

Additional file 5: Figure S4. Cross-validation with DS dosage effects 
detected with an HSA21 microarray [54], Box-plots of meta-analysis 
scores (Y-axis) for class I and II (dosage effects) and class III and IV 
(compensation and variable expression) genes as judged by the authors. 